Claims 



What is claimed is: 



1. A method of providing speaker recognition, said method comprising the steps of:

providing a model corresponding to a target speaker, the model being resolved 
into at least one frame and at least one level of phonetic detail; 

receiving an identity claim;

ascertaining whether the identity claim corresponds to the target speaker model;
said ascertaining step comprising the steps of:

determining, for each frame and each level of phonetic detail of the target
speaker model, a non-interpolated likelihood value; and

resolving the at least one likelihood value to obtain a likelihood score.

2. The method according to Claim 1, wherein, for each frame and each level of
phonetic detail, the non-interpolated likelihood value is a maximum likelihood value.
3. The method according to Claim 2, wherein said step of resolving the at least
one likelihood value comprises averaging the at least one likelihood value. 

4. The method according to Claim 3, wherein the likelihood value is determined
via the following general equation:

$$S(U \mid M) = \sum_{i=1}^{L} \sum_{t=1}^{T} b_{i,j(i,t)} \, \log P\big(u_t \mid M(i, j(i,t))\big)$$

wherein $b_{i,j(i,t)}$ corresponds to grain-specific weights that satisfy

$$\sum_{i=1}^{L} \sum_{t=1}^{T} b_{i,j(i,t)} = 1$$

and further wherein:
$S$ is the likelihood score;

$U$ is a test utterance, comprising $T$ frames $u_1, \ldots, u_T$;

$M(i,j)$ is a speaker model, with $1 \le i \le L$ levels of detail and with $1 \le j \le K(i)$ units
on the $i$-th level; and

$P(u_t \mid M(i,j))$ is the probability that a frame $u_t$ corresponds to a speaker model unit $j$
on the $i$-th level of phonetic detail of the speaker model.
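The general equation of Claim 4 can be illustrated with a minimal Python sketch. It assumes the per-frame log-likelihoods $\log P(u_t \mid M(i,j))$ are already available, takes $j(i,t)$ to be the maximum-likelihood unit on level $i$ for frame $t$ (the non-interpolated choice of Claim 2), and normalises the grain-specific weights over the selected $(i, t)$ pairs so that they sum to one. The function name `multi_grained_score`, the nested-list layout, and this normalisation scheme are hypothetical illustration choices, not the claimed implementation.

```python
from typing import Optional, Sequence


def multi_grained_score(
    loglik: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Sketch of S(U|M) = sum_i sum_t b_{i,j(i,t)} * log P(u_t | M(i, j(i,t))).

    loglik[i][t][j] holds log P(u_t | M(i, j)) for level i, frame t, unit j.
    weights[i][j] is a hypothetical raw (unnormalised) weight for unit j on
    level i; if omitted, every selected unit gets the same weight.
    """
    picks = []  # (raw weight, log-likelihood) of the chosen unit per (i, t)
    for i, level in enumerate(loglik):   # level of phonetic detail i
        for row in level:                # row[j] = log P(u_t | M(i, j))
            # Non-interpolated choice: keep only the maximum-likelihood unit.
            j = max(range(len(row)), key=row.__getitem__)
            w = 1.0 if weights is None else weights[i][j]
            picks.append((w, row[j]))
    # Normalise so the selected weights sum to 1 over all (i, t) pairs,
    # which turns the double sum into a weighted average.
    total = sum(w for w, _ in picks)
    return sum(w / total * ll for w, ll in picks)


# Two levels (a one-unit global level and a two-unit level), two frames.
loglik = [
    [[-1.0], [-3.0]],
    [[-2.0, -0.5], [-4.0, -1.5]],
]
print(multi_grained_score(loglik))
```

With uniform weights the score reduces to the plain average of the selected log-likelihoods, one reading of the averaging step in Claim 3.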
5. The method according to Claim 4, wherein the likelihood score is determined
by the following equation:

6. The method according to Claim 1, wherein the at least one level of phonetic
detail comprises at least one of the following: a global level; a phonemic level; and a
sub-phonemic level.

7. The method according to Claim 6, wherein the at least one level of phonetic
detail comprises all of the following three levels: a global level; a phonemic level; and a
sub-phonemic level.

8. The method according to Claim 7, wherein said step of providing a model
corresponding to a target speaker comprises creating said target speaker model on the
basis of training utterances and providing labeling information for each frame.
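A minimal Python sketch of how per-frame labeling information could be used to build a model with a global, a phonemic and a sub-phonemic level. The claims do not specify the density of each unit, so this sketch assumes, purely for illustration, one diagonal-covariance Gaussian per unit; practical systems often use Gaussian mixtures instead. The names `train_multi_grained_model`, `log_gaussian`, `frame_labels` and `var_floor` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed unit density: (mean, variance) of a diagonal Gaussian, per dimension.
Gaussian = Tuple[List[float], List[float]]


def train_multi_grained_model(
    frames: Sequence[Sequence[float]],
    frame_labels: Dict[str, Sequence[str]],
    var_floor: float = 1e-3,
) -> Dict[str, Dict[str, Gaussian]]:
    """Fit one diagonal Gaussian per unit on every level of phonetic detail.

    frame_labels maps a level name (e.g. "phonemic", "sub-phonemic") to one
    unit label per training frame; a "global" level with a single unit that
    covers all frames is always added.
    """
    levels = {"global": ["all"] * len(frames), **frame_labels}
    model: Dict[str, Dict[str, Gaussian]] = {}
    for level, labels in levels.items():
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for x, unit in zip(frames, labels):  # group frames by their unit label
            groups[unit].append(x)
        model[level] = {}
        for unit, xs in groups.items():
            n, dim = len(xs), len(xs[0])
            mean = [sum(x[d] for x in xs) / n for d in range(dim)]
            # Maximum-likelihood variance, floored to avoid degenerate units.
            var = [max(sum((x[d] - mean[d]) ** 2 for x in xs) / n, var_floor)
                   for d in range(dim)]
            model[level][unit] = (mean, var)
    return model


def log_gaussian(x: Sequence[float], g: Gaussian) -> float:
    """log N(x; mean, diag(var)), i.e. log P(u_t | M(i, j)) under this assumption."""
    mean, var = g
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


model = train_multi_grained_model(
    frames=[[0.0], [2.0], [10.0], [12.0]],
    frame_labels={"phonemic": ["a", "a", "b", "b"]},
)
```

Evaluating `log_gaussian` for a test frame against every unit of every level yields the per-frame log-likelihoods that the likelihood equation of Claim 4 operates on.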

9. The method according to Claim 1, wherein said ascertaining step further 
comprises accepting or rejecting the identity claim. 
10. The method according to Claim 9, wherein said step of accepting or rejecting
comprises comparing a quantity based on the likelihood score to a predetermined
threshold value.

11. The method according to Claim 10, further comprising the steps of:

providing at least one model corresponding to at least one background speaker;
and

determining the quantity based on the likelihood score via employing the at least
one background speaker model.

12. The method according to Claim 11, wherein said step of determining the
quantity based on the likelihood score comprises determining a log-likelihood ratio based on the
likelihood score.

13. The method according to Claim 12, wherein the log-likelihood ratio is
determined by the following equation:

$$L = S(U \mid M) - \frac{1}{N} \sum_{i=1}^{N} S(U \mid BG_i)$$

wherein:
$L$ is the log-likelihood ratio;
$S$ is the likelihood score;
$M$ denotes the target speaker model;
$BG_i$ denotes the $i$-th background model; and
$N$ is the number of background models.
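The log-likelihood ratio of Claim 13 and the threshold comparison of Claim 10 can be sketched together in Python. This assumes every $S$ value is already a log-domain likelihood score (for instance one produced by the general equation of Claim 4). The function names, the threshold value used in the example, and the choice of accepting when the ratio is greater than or equal to the threshold are hypothetical.

```python
from typing import Sequence


def log_likelihood_ratio(target_score: float,
                         background_scores: Sequence[float]) -> float:
    """L = S(U|M) - (1/N) * sum_i S(U|BG_i) over N background models."""
    if not background_scores:
        raise ValueError("at least one background model score is required")
    return target_score - sum(background_scores) / len(background_scores)


def accept_claim(target_score: float,
                 background_scores: Sequence[float],
                 threshold: float) -> bool:
    """Accept the identity claim when the ratio reaches the threshold."""
    return log_likelihood_ratio(target_score, background_scores) >= threshold


# Target score -1.0 against two background scores averaging -2.5.
print(log_likelihood_ratio(-1.0, [-2.0, -3.0]))
print(accept_claim(-1.0, [-2.0, -3.0], threshold=1.0))
```

Averaging over background models, rather than taking only the best one, follows the $\frac{1}{N}\sum$ form of the equation; the threshold itself would in practice be tuned on development data.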

14. An apparatus for providing speaker recognition, said apparatus comprising:

a target speaker model generator for generating a model corresponding to a target
speaker, the model being resolved into at least one frame and at least one level of phonetic
detail;

a receiving arrangement for receiving an identity claim; 

a decision arrangement for ascertaining whether the identity claim corresponds to 
the target speaker model; 

said decision arrangement being adapted to: 

determine, for each frame and each level of phonetic detail of the target 
speaker model, a non-interpolated likelihood value; and 
resolve the at least one likelihood value to obtain a likelihood score.

15. The apparatus according to Claim 14, wherein, for each frame and each level
of phonetic detail, the non-interpolated likelihood value is a maximum likelihood value.

16. The apparatus according to Claim 15, wherein said decision arrangement is
adapted to resolve the at least one likelihood value via averaging the at least one
likelihood value.

17. The apparatus according to Claim 16, wherein the likelihood value is 
determined via the following general equation: 



$$S(U \mid M) = \sum_{i=1}^{L} \sum_{t=1}^{T} b_{i,j(i,t)} \, \log P\big(u_t \mid M(i, j(i,t))\big)$$



wherein $b_{i,j(i,t)}$ corresponds to grain-specific weights that satisfy

$$\sum_{i=1}^{L} \sum_{t=1}^{T} b_{i,j(i,t)} = 1$$



and further wherein: 



$S$ is the likelihood score;

$U$ is a test utterance, comprising $T$ frames $u_1, \ldots, u_T$;

$M(i,j)$ is a speaker model, with $1 \le i \le L$ levels of detail and with $1 \le j \le K(i)$ units
on the $i$-th level; and

$P(u_t \mid M(i,j))$ is the probability that a frame $u_t$ corresponds to a speaker model unit $j$
on the $i$-th level of phonetic detail of the speaker model.

18. The apparatus according to Claim 17, wherein the likelihood score is 
determined by the following equation: 

19. The apparatus according to Claim 14, wherein the at least one level of
phonetic detail comprises at least one of the following: a global level; a phonemic level;
and a sub-phonemic level.

20. The apparatus according to Claim 19, wherein the at least one level of
phonetic detail comprises all of the following three levels: a global level; a phonemic level;
and a sub-phonemic level.
21. The apparatus according to Claim 20, wherein said target speaker model
generator is adapted to generate said target speaker model on the basis of training
utterances and to provide labeling information for each frame.

22. The apparatus according to Claim 14, wherein said decision arrangement is
further adapted to accept or reject the identity claim.

23. The apparatus according to Claim 22, wherein said decision arrangement is
adapted to accept or reject the identity claim via comparing a quantity based on the
likelihood score to a predetermined threshold value.

24. The apparatus according to Claim 23, further comprising:

a background speaker model generator for providing at least one model
corresponding to at least one background speaker;

said decision arrangement being adapted to determine the quantity based on the
likelihood score via employing the at least one background speaker model.

25. The apparatus according to Claim 24, wherein said decision arrangement is
adapted to determine the quantity based on the likelihood score via determining a log-likelihood
ratio based on the likelihood score.
26. The apparatus according to Claim 25, wherein the log-likelihood ratio is
determined by the following equation:

$$L = S(U \mid M) - \frac{1}{N} \sum_{i=1}^{N} S(U \mid BG_i)$$

wherein:
$L$ is the log-likelihood ratio;
$S$ is the likelihood score;
$M$ denotes the target speaker model;
$BG_i$ denotes the $i$-th background model; and
$N$ is the number of background models.

27. A program storage device readable by machine, tangibly embodying a
program of instructions executable by the machine to perform method steps for providing
speaker recognition, said method comprising the steps of:

providing a model corresponding to a target speaker, the model being resolved 
into at least one frame and at least one level of phonetic detail; 

receiving an identity claim; 